In this paper, we study multi-block min-max bilevel optimization problems, where the upper level is a non-convex min-max objective, the lower level is a strongly convex objective, and there are multiple blocks of dual variables and lower-level problems. Due to the intertwined multi-block min-max bilevel structure, the computational cost per iteration could be prohibitively high, especially with a large number of blocks. To address this challenge, we propose a single-loop randomized stochastic algorithm that only requires updating a constant number of blocks at each iteration. Under some mild assumptions on the problem, we establish a sample complexity of $\mathcal{O}(1/\epsilon^4)$ for finding an $\epsilon$-stationary point. This matches the optimal complexity for solving stochastic non-convex optimization under a general unbiased stochastic oracle model. Moreover, we provide two applications in multi-task deep AUC (area under the ROC curve) maximization and multi-task deep partial AUC maximization. Experimental results validate our theory and demonstrate the effectiveness of our method on problems with hundreds of tasks.
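To make the problem structure concrete, the following is a hedged sketch of the formulation as we read the abstract (the notation and the placement of the convexity assumptions are ours, not necessarily the paper's):

```latex
\min_{x}\ \max_{y_1,\dots,y_m}\
  \frac{1}{m}\sum_{i=1}^{m} f_i\bigl(x,\, y_i,\, z_i^{*}(x)\bigr)
\quad \text{s.t.} \quad
  z_i^{*}(x) = \arg\min_{z_i}\, g_i(x, z_i), \qquad i = 1,\dots,m,
```

where each $f_i$ couples the shared variable $x$ with its own dual block $y_i$ and lower-level solution $z_i^*(x)$, each $g_i$ is strongly convex in $z_i$, and the proposed algorithm samples only a constant-size subset of the $m$ blocks per iteration.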
Lightweight super-resolution (SR) models have received considerable attention for their deployability on mobile devices. Many efforts adopt network quantization to compress SR models. However, these methods suffer from severe performance degradation when quantizing SR models to ultra-low precision (e.g., 2-bit and 3-bit) with low-cost layer-wise quantizers. In this paper, we identify that the performance drop comes from the contradiction between the layer-wise symmetric quantizer and the highly asymmetric activation distributions in SR models. This discrepancy leads to either a waste of quantization levels or a loss of detail in reconstructed images. Therefore, we propose a novel activation quantizer, referred to as Dynamic Dual Trainable Bounds (DDTB), to accommodate the asymmetry of the activations. Specifically, DDTB innovates in: 1) a layer-wise quantizer with trainable upper and lower bounds to tackle the highly asymmetric activations; 2) a dynamic gate controller that adaptively adjusts the upper and lower bounds at runtime to overcome the drastically varying activation ranges across different samples. To reduce the extra overhead, the dynamic gate controller is quantized to 2-bit and applied to only part of the SR network according to the introduced dynamic intensity. Extensive experiments demonstrate that DDTB delivers significant performance gains at ultra-low precision. For example, when quantizing EDSR to 2-bit and upscaling output images by x4, our DDTB achieves a 0.70dB PSNR increase on the Urban100 benchmark. Code is available at https://github.com/zysxmu/DDTB.
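As a rough illustration of the trainable-bound idea, here is a minimal PyTorch sketch (the class name, initialization, and straight-through gradient are our assumptions, and the dynamic gate controller is omitted; see the official repo for the actual implementation):

```python
import torch
import torch.nn as nn

class DualBoundActQuantizer(nn.Module):
    """Sketch of an asymmetric activation quantizer with trainable
    upper and lower clipping bounds, in the spirit of DDTB."""

    def __init__(self, n_bits: int = 2):
        super().__init__()
        self.levels = 2 ** n_bits - 1
        self.lower = nn.Parameter(torch.tensor(-1.0))  # trainable lower bound
        self.upper = nn.Parameter(torch.tensor(1.0))   # trainable upper bound

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        lo, hi = self.lower, self.upper
        scale = (hi - lo).clamp_min(1e-8) / self.levels
        x_clip = torch.minimum(torch.maximum(x, lo), hi)     # asymmetric clipping
        q = torch.round((x_clip - lo) / scale) * scale + lo  # uniform levels in [lo, hi]
        # Straight-through estimator: quantized forward pass, identity gradient.
        return x_clip + (q - x_clip).detach()
```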
NDCG, namely Normalized Discounted Cumulative Gain, is a widely used ranking metric in information retrieval and machine learning. However, efficient and provable stochastic methods for maximizing NDCG are still lacking, especially for deep models. In this paper, we propose a principled approach to optimizing NDCG and its top-$K$ variant. First, we formulate a novel compositional optimization problem for optimizing an NDCG surrogate, and a novel bilevel compositional optimization problem for optimizing a top-$K$ NDCG surrogate. Then, we develop efficient stochastic algorithms with provable convergence guarantees for the non-convex objectives. Unlike existing NDCG optimization methods, the complexity of our algorithms scales with the mini-batch size rather than the total number of items. To improve effectiveness for deep learning, we further propose practical strategies using initial warm-up and a stop-gradient operator. Experimental results on multiple datasets show that our methods outperform prior ranking approaches in terms of NDCG. To the best of our knowledge, this is the first time stochastic algorithms with provable convergence guarantees have been proposed to optimize NDCG. Our proposed methods are implemented in the LibAUC library at https://libauc.org/.
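To illustrate why NDCG needs a surrogate at all, here is a hedged sketch of a differentiable NDCG loss built from a sigmoid-based soft rank; it shows the general idea only, not the paper's exact surrogate or its compositional estimator:

```python
import torch

def soft_ndcg_loss(scores: torch.Tensor, gains: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Differentiable NDCG sketch: replace each item's hard rank with a
    sigmoid-based soft rank, then return negative NDCG as the loss.

    scores: model scores for one query's items, shape (n,)
    gains:  relevance gains, e.g. 2**rel - 1, shape (n,)
    """
    diff = scores.unsqueeze(0) - scores.unsqueeze(1)  # diff[j, i] = s_j - s_i
    # Soft rank of item i: 1 + sum_{j != i} sigmoid((s_j - s_i)/tau); the
    # j = i term contributes sigmoid(0) = 0.5, so it is subtracted out.
    soft_rank = 0.5 + torch.sigmoid(diff / tau).sum(dim=0)
    dcg = (gains / torch.log2(1.0 + soft_rank)).sum()
    # Ideal DCG uses the true gains in sorted order (constant w.r.t. scores).
    ideal, _ = torch.sort(gains, descending=True)
    pos = torch.arange(1, gains.numel() + 1, dtype=gains.dtype, device=gains.device)
    idcg = (ideal / torch.log2(1.0 + pos)).sum()
    return -dcg / idcg.clamp_min(1e-8)
```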
While post-training quantization owes its popularity largely to avoiding access to the original, complete training dataset, its poor performance also stems from this limitation. To alleviate it, in this paper we leverage the synthetic data introduced by zero-shot quantization together with the calibration dataset, and propose a fine-grained data distribution alignment (FDDA) method to boost the performance of post-training quantization. The method builds on two important properties of the batch normalization statistics (BNS) that we observe in the deep layers of a trained network, namely inter-class separation and intra-class incohesion. To preserve this fine-grained distribution information: 1) we compute the per-class BNS of the calibration dataset as the BNS center of each class, and propose a BNS-centralized loss that forces the synthetic data distributions of different classes to stay close to their own centers; 2) we add Gaussian noise to the centers to imitate the incohesion, and propose a BNS-distorted loss that forces the synthetic data distribution of the same class to stay close to the distorted centers. By introducing these two fine-grained losses, our method achieves state-of-the-art performance on ImageNet, especially when the first and last layers are also quantized to low bit-widths. Our project is available at https://github.com/zysxmu/FDDA.
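A minimal sketch of what a BNS-centralized loss could look like (tensor shapes, the squared-distance choice, and all names are our assumptions, not the released code); the BNS-distorted loss would reuse it with Gaussian-perturbed centers, e.g. `center_mean + sigma * torch.randn_like(center_mean)`:

```python
import torch

def bns_centralized_loss(feat: torch.Tensor,
                         center_mean: torch.Tensor,
                         center_var: torch.Tensor) -> torch.Tensor:
    """Push the batch-normalization statistics (per-channel mean/variance)
    of synthetic images from one class toward that class's BNS center
    computed on the calibration set.

    feat:        features of synthetic samples of one class, (N, C, H, W)
    center_mean: per-channel mean of the class's BNS center, (C,)
    center_var:  per-channel variance of the class's BNS center, (C,)
    """
    mean = feat.mean(dim=(0, 2, 3))                 # per-channel batch mean
    var = feat.var(dim=(0, 2, 3), unbiased=False)   # per-channel batch variance
    return (mean - center_mean).pow(2).mean() + (var - center_var).pow(2).mean()
```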
A recent study has shown a phenomenon called neural collapse, in which the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
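For reference, a hedged sketch of the simplex ETF structure mentioned above and of a simple regularizer pulling class feature centers toward it (our illustration; the paper's regularizer may differ):

```python
import torch
import torch.nn.functional as F

def simplex_etf(num_classes: int, dim: int) -> torch.Tensor:
    """Build a K x d simplex equiangular tight frame: K unit vectors with
    equal pairwise angles, the structure neural collapse converges to.
    Requires dim >= num_classes."""
    k = num_classes
    u, _ = torch.linalg.qr(torch.randn(dim, k))  # orthonormal columns, d x K
    m = (k / (k - 1)) ** 0.5 * (u @ (torch.eye(k) - torch.ones(k, k) / k))
    return m.t()                                 # K x d, rows are class directions

def center_alignment_regularizer(class_centers: torch.Tensor,
                                 etf: torch.Tensor) -> torch.Tensor:
    """Toy regularizer: pull normalized per-class feature centers toward
    the ETF directions (illustration only)."""
    c = F.normalize(class_centers, dim=1)
    return (c - F.normalize(etf, dim=1)).pow(2).sum(dim=1).mean()
```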
Although deep learning models have made remarkable progress in processing various types of data such as images, text, and speech, they are known to be susceptible to adversarial perturbations: perturbations specifically designed and added to the input to make the target model produce erroneous output. Most existing studies on generating adversarial perturbations attempt to perturb the entire input indiscriminately. In this paper, we propose ExploreADV, a general and flexible adversarial attack system that is capable of modeling regional and imperceptible attacks, allowing users to explore various kinds of adversarial examples as needed. We adapt and combine two existing boundary attack methods, DeepFool and the Brendel & Bethge attack, and propose a mask-constrained adversarial attack system, which generates minimal adversarial perturbations under pixel-level constraints, namely "mask-constraints". We study different ways of generating such mask-constraints considering the variance and importance of the input features, and show that our adversarial attack system offers users good flexibility to focus on sub-regions of the input, explore imperceptible perturbations, and understand the vulnerability of pixels/regions to adversarial attacks. We demonstrate the effectiveness of our system through extensive experiments and a user study.
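A minimal sketch of the mask-constraint idea (the variance heuristic and threshold are our assumptions, and the attack producing `delta` is abstracted away):

```python
import torch
import torch.nn.functional as F

def variance_mask(x: torch.Tensor, kernel: int = 5, thresh: float = 1e-2) -> torch.Tensor:
    """Toy variance-based mask: allow perturbations only where local pixel
    variance is high, where changes tend to be less perceptible (one of
    several possible mask-generation strategies).

    x: clean image, (C, H, W), values in [0, 1]
    """
    pad = kernel // 2
    mu = F.avg_pool2d(x.unsqueeze(0), kernel, stride=1, padding=pad)
    var = F.avg_pool2d((x.unsqueeze(0) - mu) ** 2, kernel, stride=1, padding=pad)
    return (var.squeeze(0).mean(dim=0, keepdim=True) > thresh).float()  # (1, H, W)

def apply_masked_perturbation(x: torch.Tensor, delta: torch.Tensor,
                              mask: torch.Tensor) -> torch.Tensor:
    """Confine an adversarial perturbation to the masked sub-region; the
    attack that produces `delta` (DeepFool / Brendel & Bethge style steps)
    is not shown here."""
    return torch.clamp(x + delta * mask, 0.0, 1.0)  # keep valid pixel range
```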
Recently, deep learning has shown its advantage in representation learning and clustering for time series data. Despite the considerable progress, existing deep time series clustering approaches mostly seek to train the deep neural network with some instance-reconstruction-based or cluster-distribution-based objective, which, however, lacks the ability to exploit sample-wise (or augmentation-wise) contrastive information, or even higher-level (e.g., cluster-level) contrastiveness, for learning discriminative and clustering-friendly representations. In light of this, this paper presents a deep temporal contrastive clustering (DTCC) approach, which, to our knowledge, for the first time incorporates the contrastive learning paradigm into deep time series clustering research. Specifically, with two parallel views generated from the original time series and their augmentations, we utilize two identical auto-encoders to learn the corresponding representations, and in the meantime perform cluster distribution learning by incorporating a k-means objective. Further, two levels of contrastive learning are simultaneously enforced to capture instance-level and cluster-level contrastive information, respectively. With the reconstruction loss of the auto-encoder, the cluster distribution loss, and the two levels of contrastive losses jointly optimized, the network is trained in a self-supervised manner and the clustering result can thereby be obtained. Experiments on a variety of time series datasets demonstrate the superiority of our DTCC approach over the state-of-the-art.
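As an illustration of the instance-level contrastive term, here is a standard NT-Xent sketch under our assumptions (the cluster-level term applies the same idea to cluster-assignment vectors and is omitted; details are ours, not the paper's):

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Instance-level contrastive loss: a time series and its augmentation
    form a positive pair; all other samples in the batch are negatives.

    z1, z2: encoder outputs for the two views, each (B, d)
    """
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2B, d)
    sim = z @ z.t() / tau                               # scaled cosine similarities
    n = z.size(0)
    sim.fill_diagonal_(float('-inf'))                   # exclude self-pairs
    # The positive of sample i is its other view at index (i + B) mod 2B.
    targets = torch.arange(n, device=z.device).roll(n // 2)
    return F.cross_entropy(sim, targets)
```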
Accurate and smooth global navigation satellite system (GNSS) positioning for pedestrians in urban canyons is still a challenge due to the multipath effects and the non-line-of-sight (NLOS) receptions caused by reflections from surrounding buildings. The recently developed factor graph optimization (FGO) based GNSS positioning method opened a new window for improving urban GNSS positioning by effectively exploiting the measurement redundancy from historical information to resist outlier measurements. Unfortunately, FGO-based GNSS standalone positioning is still challenged in highly urbanized areas. As an extension of the previous FGO-based GNSS positioning method, this paper exploits the potential of the pedestrian dead reckoning (PDR) model in FGO to improve GNSS standalone positioning performance in urban canyons. Specifically, the relative motion of the pedestrian is estimated from the raw acceleration measurements of the onboard smartphone inertial measurement unit (IMU) via the PDR algorithm. Then the raw GNSS pseudorange, Doppler measurements, and relative motion from PDR are integrated using the FGO. Given that pedestrian navigation involves small accelerations most of the time, a novel soft motion model is proposed to smooth the states involved in the factor graph model. The effectiveness of the proposed method is verified step-by-step through two datasets collected in dense urban canyons of Hong Kong using smartphone-level GNSS receivers. The comparison between the conventional extended Kalman filter, several existing methods, and FGO-based integration is presented. The results reveal that the existing FGO-based GNSS standalone positioning is highly complementary to the PDR's relative motion estimation. Both improved positioning accuracy and trajectory smoothness are obtained with the help of the proposed method.
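For intuition about the PDR measurements being fused, here is a toy sketch of a single dead-reckoning update (the Weinberg coefficient and the heading convention are assumptions; the paper's PDR and factor-graph details differ):

```python
import numpy as np

def pdr_step(position: np.ndarray, heading_rad: float,
             acc_peak: float, acc_valley: float, k: float = 0.4) -> np.ndarray:
    """One pedestrian-dead-reckoning update: a detected step advances the
    position along the current heading by a Weinberg-style step length.

    position:    current 2D position [east, north] in meters
    heading_rad: walking heading in radians (0 = north, clockwise positive)
    acc_peak, acc_valley: vertical acceleration extrema over one step (m/s^2)
    """
    step_len = k * abs(acc_peak - acc_valley) ** 0.25  # Weinberg step-length model
    delta = step_len * np.array([np.sin(heading_rad), np.cos(heading_rad)])
    return position + delta
```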
In person re-identification (ReID) tasks, many works explore the learning of part features to improve the performance over global image features. Existing methods extract part features in an explicit manner, by either using a hand-designed image division or keypoints obtained with external visual systems. In this work, we propose to learn Discriminative implicit Parts (DiPs) which are decoupled from explicit body parts. Therefore, DiPs can learn to extract any discriminative features that can benefit in distinguishing identities, which is beyond predefined body parts (such as accessories). Moreover, we propose a novel implicit position to give a geometric interpretation for each DiP. The implicit position can also serve as a learning signal to encourage DiPs to be more position-equivariant with the identity in the image. Lastly, a set of attributes and auxiliary losses are introduced to further improve the learning of DiPs. Extensive experiments show that the proposed method achieves state-of-the-art performance on multiple person ReID benchmarks.
We introduce a multi-scale predictive model for video prediction, whose design is inspired by "Predictive Coding" theories and the "Coarse to Fine" approach. As a predictive coding model, it is updated by a combination of bottom-up and top-down information flows, which differs from the traditional bottom-up training style. Its advantage is to reduce the dependence on input information and improve the ability to predict and generate images. Importantly, we achieve this with a multi-scale approach: higher-level neurons generate coarser predictions (lower resolution), while lower-level neurons generate finer predictions (higher resolution). This differs from the traditional predictive coding framework, in which higher levels predict the activity of neurons in lower levels. To improve the predictive ability, we integrate an encoder-decoder network into the LSTM architecture and share the final encoded high-level semantic information between levels. Additionally, since the output of each network level is an RGB image, a smaller LSTM hidden state can be used to retain and update only the necessary hidden information, avoiding mapping into an overly discrete and complex space. In this way, we reduce both the difficulty of prediction and the computational overhead. Finally, we further explore training strategies to address the instability in adversarial training and the mismatch between training and testing in long-term prediction. Code is available at https://github.com/Ling-CF/MSPN.
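A hedged sketch of the coarse-to-fine supervision implied by the abstract, where each level's RGB prediction is compared to a correspondingly downsampled target frame (the loss choice is ours; the predictive-coding updates and LSTM encoder-decoder are not shown, see the MSPN repo for the actual model):

```python
import torch
import torch.nn.functional as F

def multiscale_prediction_loss(preds: list, target: torch.Tensor) -> torch.Tensor:
    """Compare each level's predicted frame against the ground-truth next
    frame downsampled to that level's resolution.

    preds:  per-level predicted frames, fine to coarse, each (B, 3, H_l, W_l)
    target: ground-truth next frame, (B, 3, H, W)
    """
    loss = target.new_zeros(())
    for pred in preds:
        tgt = F.interpolate(target, size=pred.shape[-2:], mode='bilinear',
                            align_corners=False)
        loss = loss + F.l1_loss(pred, tgt)
    return loss / len(preds)
```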